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Abstract 


The  implicit  Cholesky  algorithm  has  been  developed  by  several  authors  during 
the  last  10  years  but  under  different  names.  We  identify  the  algorithm  with  a  special 
version  of  Rutishauser’s  LR  algorithm.  Intermediate  quantities  in  the  transformation 
furnish  several  attractive  approximations  to  the  smallest  singular  value. 

The  paper  extols  the  advantages  of  using  shifts  with  the  algorithm.  The  non- 
orthogonal  transformations  improve  accuracy. 

Key  words:  Cholesky  decomposition,  singular  values,  Singular  value  decompo¬ 
sition,  eigenvalues,  null  spaces,  noise  spaces,  URV  factorization,  ULV  factorization, 
QR  algorithm,  LR  algorithm,  Jacobi  methods 
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1  Introduction 


We  want  to  recommend  a  new  way  to  compute  singular  values  and  vectors  of  triangular 
matrices. 

One  of  the  ideas  behind  our  algorithm  is  not  new.  The  flipping  of  a  triangular  matrix 
between  upper  and  lower  forms,  using  orthogonal  transformations  exclusively,  first  from 
one  side  and  then  from  the  other  has  become  a  popular  activity  among  signal  processors 
recently,  see  [13],  [4].  The  end  product  is  either  the  SVD  or  the  updating  of  a  noise 
subspace.  Mathias  and  Stewart  in  [13],  used  this  see-saw  procedure  in  the  refinement 
step  of  the  URV  factorization.  Chandrasekaran  and  Ipsen  in  [2]  looked  at  this  flip-flop 
technique  as  the  basic  QR  algorithm  and  Ammann  in  [1]  goes  so  far  <ls  to  call  it  the 
transpose  QR  algorithm.  This  name  is  a  bit  misleading. 

As  long  ago  as  1968  Faddeev  et  al,  in  [5]  recognized  that  one  such  transformation  of  a 
triangular  matrix  is  equivalent  to  one  step  of  the  Cholesky  LR  algorithm  of  Rutishauser: 
L  =  QR  implies  VL  =  R*R.  However  the  way  R  is  produced  from  L  is  quite  differ¬ 
ent  from  Rutishauser ’s  transformation  of  LV  to  R^R  and  this  suggests  to  us  that  the 
proper  name  for  the  triangular  flipping  procedure  is  the  implicit  Cholesky  LR  algorithm 
or,  better,  the  implicit  Cholesky  algorithm  since  LR  is  a  bit  redundant.  We  plan  to 
discuss  the  implicit  LR  algorithm  for  nonsymmetric  eigenvalue  problems  in  another  com¬ 
munication.  Our  naming  conventions  are  consistent  with  past  usage;  QR  factorization  is 
the  basis  for  the  QR  algorithm.  Similarly  Cholesky  factorization  leads  to  the  Cholesky 
algorithm.  The  interesting  dissertation  [22]  adopts  this  terminology. 

Since  two  steps  of  the  unshifted  Cholesky  algorithm  are  equivalent  to  one  step  of  the 
unshifted  QR  algorithm  it  is  clear  how  QR  comes  into  the  picture. 

What  seems  to  be  a  novel  contribution  of  this  paper  is  to  show  how  to  combine  the  use 
of  shifts  and  the  calculation  of  either  right  or  left  singular  vectors  but  not  both.  In  many 
applications  this  suffices.  Our  Algorithm  5  only  delivers  the  right  singiilar  vectors. 

We  discuss  convergence  to  show  how  eeisy  the  argument  is  in  the  unshifted  case  and  to 
sharpen  the  idea  of  Rutishauser  for  getting  a  very  accurate  shift  from  a  case  of  late  failure 
in  the  shifted  version. 

We  do  not  give  a  formal  error  analysis  here  because  the  backward  stability  results  for 
the  unshifted  case  extend  to  our  shifted  algorithm.  In  fact  our  accuracy  (absolute,  not 
relative)  is  better  than  the  standard  procedures  in  practice  but  we  have  not  proved  this 
property  yet.  Our  shifted  algorithms  do  not  preserve  the  Frobenius  norm  they  actually 
decrease  it.  This  improves  speed  and  accuracy. 

There  is  some  evidence  that  the  connection  between  triangular  flipping  and  either  QR 
or  Cholesky  is  not  widely  appreciated.  The  strongest  piece  of  evidence  is  the  absence  of 
shifting.  Experts  are  happily  extoUing  the  flip-flop  of  triangular  matrices  who  would  not 
be  caught  dead  using  QR  or  LR  algorithms  without  appropriate  shifts.  It  is  as  though  an 
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interdiction  has  been  placed  on  using  anything  other  than  orthogonal  transformations  for 
SVD  calcxilations.  In  [7]  we  showed  that  for  high  relative  accuracy  in  treating  bidiagonal 
matrices  it  is  advisable  to  abandon  plane  rotations.  Consequently  orthogonal  transfor¬ 
mations  are  sufficient  but  by  no  means  necessary  for  obtaining  accurate  approximations. 

The  numerical  results  suggest  to  us  that  the  current  triangular  flipping  algorithms  are 
popular  because  little  accuracy  is  required  and  the  snail  hke  pace  of  linear  convergence 
does  not  hurt  much.  With  sensible  shifts  the  implicit  Cholesky  algorithm  can  be  used 
like  a  LAPACK  routine  to  yield  singular  values  with  maximal  accuracy  in  an  absolute 
sense  {macheps  ^  norm). 

Here  we  present  the  implicit  Cholesky  algorithm  with  shifts  and  make  a  plea  for  its 
adoption,  at  least  by  those  who  are  in  a  hurry.  New  results,  Theorems  1  and  3,  show 
that  the  intermediate  quantities  produced  by  the  algorithm  yield  valuable  information 
for  shift  selection. 


2  Notation 


Householder  conventions  are  followed:  capital  letters  for  matrices,  Greek  letters  for 
scalars,  lower  case  Roman  for  vectors.  We  use  the  Euclidean  vector  norm  \\x\\^  =  x*x  and 
the  spectral  matrix  norm  (|F||  =  ai[F]  throughout  the  paper  where  denotes  the  ith 
singular  value.  Similarly,  A,*[.]  signifies  the  ith  eigenvalue.  We  use  x*  for  the  conjugate 
transpose  of  x  but  since  most  of  our  quantities  are  real  x*  also  denotes  the  transpose. 

The  columns  of  the  identity  matrix  are  denoted  by  e:  /  =  (ei,e2, . .  .,e„). 


3  Uses  of  Triangular  Form 


If  one  wants  to  compute  the  SVD  of  a  single  matrix,  the  most  efficient  and  accurate 
method  available  appears  to  be  the  shifted  differential  qd  algorithm  of  Fernando  and 
Parlett  which  is  a  special  case  of  the  implicit  LR  algorithm.  See  [7].  However  this 
requires  the  bidiagonalization  of  the  matrix  using  orthogonal  transformations.  There  are 
many  occasions  when  this  bidiagonalization  is  not  warranted.  For  example,  in  signal 
processing  applications  this  reduction  is  not  advised  since  new  data  has  to  be  brought  in 
and  old  data  has  to  be  removed.  Triangular  matrices  are  a  convenient  form  in  this  case. 
See  for  example  [14]  and  [20]. 

Although  the  SVD  is  the  most  reliable  decomposition  to  obtain  the  rank  of  a  matrix, 
it  is  often  an  overkill.  That  is  why  intermediate  non-canonical  decompositions  such  jls 
the  URV  and  ULV  factorizations  of  Stewart  have  many  applications,  especially  in  signal 
processing  and  statistics.  If  one  requires  only  the  smallest  singular  values  or  the  singular 
vectors  corresponding  to  the  smallest  singular  values,  the  triangular  structure  is  often 
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adequate  to  obtain  these  rank  revealing  properties.  In  addition  the  condition  number  is 
revealed  very  quickly  with  our  shifted  implicit  Cholesky  algorithm. 

The  reduction  to  bidiagonaJ  form  may  not  be  needed  if  the  matrix  already  has  small 
off-diagonal  elements  and  the  diagonal  entries  are  monotone  decreasing  or  nearly  so. 

The  flipping  process  can  be  also  used  as  a  preconditioner  for  other  SVD  algorithms. 
Veselic  and  Hari  used  one  step  of  the  Cholesky  algorithm  with  pivoting  as  an  effective 
preconditioner  for  the  one-sided  Jacobi  algorithm  of  Hestenes.  Lacking  the  implicit  form 
of  the  algorithm  they  felt  obliged  to  limit  themselves  to  one  preconditioning  step.  How¬ 
ever,  by  using  the  flipping  process  many  preconditioning  steps  can  be  used. 

If  the  matrix  is  banded  and  triangular,  then  the  flipping  process  preserves  bandwidth. 
This  comes  from  the  well  known  fact  that  the  LR  algorithm  preserves  the  bandwidths 
of  matrices.  Since  bidiagonalization  of  banded  matrices  is  relatively  expensive,  implicit 
Cholesky  is  a  convenient  method  to  And  the  SVD  or  the  approximate  null  space  of  a 
banded  matrix. 


4  The  Implicit  Cholesky  Algorithm 


Our  goal  here  is  to  present  neglected  implementations  of  the  Cholesky  LR  algorithm  that 
avoid  the  two  defining  steps  of  this  classical  method  presented  as  Algorithm  1. 

Algorithm  1  (Standard  Cholesky  to  diagonalize  symmetric  positive  definite  A) 

Ao  :=  A 

For  t  =  0, 1, 2, . . .  until  converged 

(a)  Compute  the  Cholesky  factorization  LiL^  :=  A, 

(b)  A.+:  ^  LTL, 


Here  i,-  is  lower  triangular  and  X*  is  its  conjugate  transpose. 

In  exact  arithmetic  as  t  ^  oo  ,  X,-  — >  £,  the  diagonal  matrix  of  (often,  ordered)  singular 
values  of  Lq.  Moreover,  A,-  — »•  the  diagonal  matrix  of  ordered  eigenvalues  of  A.  In 

floating-point  arithmetic,  the  computed  eigenvalues  will  be  accurate  in  an  absolute  sense 
but  the  relative  accuracy  of  smaller  singular  values  may  be  poor. 

The  overlooked  fact  is  that  there  is  no  need  to  form  A,+i  in  order  to  obtain  its  Cholesky 
factor  Xf+i.  Instead  consider  the  QR  factorization  of  X,-,  say 

X,'  =  QiRi 

where  Q;  =  Qr^  and  R,  is  upper  triangular  with  positive  diagonal. 


X.+i  =  R* 


Lemma  1 


Proof:  By  step  (b)  of  the  algorithm 


A.+1  =  i*i.-  =  RiQlQiRi  =  R]Ri 


By  step  (a) 

and  the  result  follows  from  the  uniqueness  of  the  Cholesky  factorization.  • 


There  is  considerable  computational  advantage  in  not  forming  the  matrices  Ai . 

Comment  It  seems  likely  that  Rutishauser  would  have  discovered  Lemma  1  (via  the 
qd  algorithm,  see  [16],  [18],  [17],  [19])  but  we  can  find  no  explicit  mention  of  this  fact. 

A  defect  of  the  traditional  LR  algorithm  is  that  one  cannot  obtain  either  eigenvectors 
of  A  or  the  singular  vectors  of  Lq  directly.  However  the  new  implementation  provides  a 
natural  mechanism  for  obtaining  those  vectors,  as  shown  below.  To  see  the  idea  write 
out  the  relation  of  Lj  to  -^oi 


Li  —  L\Qi  —  Q*qLqQ\. 


Similarly, 


Li  —  L^Qs  —  ■  •  •  —  QiQoLoQiQs- 


We  now  describe  the  new  implementation  formally. 


Algorithm  2  (Implicit  Cholesky  SVD  of  L) 
SetLo  =  L,U  =  I,V  =  I 
For  i  =  1,2,...  until  converged 

(a)  Compute  the  QR  factorization  Li  =  QiRi, 

(b)  U  *—  UQi  for  odd  i,  V  ■*—  VQ,  for  even  i 

(c)  Li+i  <—  (This  step  is  purely  formal) 


Remark  1  It  is  not  necessary  to  form  Qi  explicitly  to  obtain  R,  (=  Xt+j). 

Remark  2  On  exit  the  squares  of  X’s  diagonals  give  the  eigenvalues  of  XoXJ(=  Aq). 

Remark  3  Also,  on  exit,  U  holds  the  left  singular  vectors  of  X,  the  normalized  eigenvectors 
of  LqLq.  U  U  is  not  required,  that  part  of  the  algorithm  may  be  omitted. 

Remark  4  If  the  right  singular  vector  matrix  V  is  not  wanted,  that  part  of  the  algorithm 
may  be  omitted. 

Remark  5  If  the  matrix  Ao  is  positive  semi-definite  ,  then  the  traditional  LR  algorithm 
could  break  down.  However,  the  Iihplicit  Cholesky  algorithm  exists  even  when  Xo  is 
not  full  rank.  Diagonal  pivoting  cures  the  defect.  See  Section  5. 
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Remark  6  The  identification  of  the  basic  algorithm  with  unshifted  QR  (see  next  section) 
indicates  that  the  process  is  rather  slow.  Nevertheless  if  L  nearly  reveals  its  rank  then 
only  a  few  steps  are  needed  to  obtain  a  basis  for  the  approximate  null  space. 

Remark  7  Since  Cholesky  factorization  preserves  band  structure  and  so  does  reverse  mul¬ 
tiplication  of  the  factors  it  is  clear  that  Algorithm  1  preserves  bandwidth.  Consequently 
Algorithms  2  and  3  enjoy  the  same  property. 


5  The  Relation  of  QR  to  Implicit  Cholesky 


It  is  well  known  -  to  experts  -  that,  for  a  positive  definite  Hermitian  matrix,  two  steps  of 
the  LR  algorithm  produce  the  same  matrix  as  one  step  of  the  zero-shift  QR  algorithm. 
For  completeness  we  show  this  well  known  relationship  which  is  hard  to  find  in  the 
literature. 


Algorithm  3  (Zero-shift  QR  algorithm  for  Hermitian  M) 

Mo  =  M. 

For  i  =  0,1,2,...  until  converged 

(a)  Compute  the  QR  factorization  Mi  =  ZiTi, 

(b)  ^  TiZi  . 


Theorem  1  One  step  of  the  QR  algorithm  with  zero  shifts  is  equivalent  to  two  steps  of 
the  unshifted  Implicit  Cholesky. 


Proof:  Set  Mq  =  Aq  =  LoL^.  By  Algorithm  2, 


Mo  —  LoL^o  —  QoLiLq. 


By  uniqueness  of  the  QR  factorization  Zo  =  Qo  and  To  =  XJXJ  and  so,  from  proof  of 
Lemma  1, 

X2X2  “  ^qXqXq^o  ”  ZqMoZo  —  Ml. 


Similarly, 


A2j+2  —  X2j+2X2j+2  —  Qlj^^3^\jQ7i  —  ZfMjZj  — 


6  Incorporation  of  Pivoting 


It  is  recognized  that  pivoting  is  useful  for  improving  convergence  of  the  basic  LR  algorithm 
[18],  [17].  In  the  case  of  the  classical  LR  algorithm  pivoting  is  used  to  bring  the  large 
diagonal  elements  to  the  top  of  the  matrix  and  the  small  diagonal  values  to  the  bottom 
of  the  matrix. 
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For  Algorithm  2,  we  can  accomplish  this  by  using  the  pivoted  QR  factorization  with 
column  interchanges, 

Li  =  QiRiPi 

where  P,-  is  a  permutation  matrix.  Any  rank  revealing  Pi  could  be  used. 

The  relations  are  then 

up:  =  Q,LU, 

We  note  that  pivoting  is  dilRcult  in  many  modern  machines  and  in  some  novel  architec¬ 
tures  this  could  be  prohibitively  expensive.  However,  we  do  not  expect  pivoting  to  be 
needed  except  in  the  first  or  the  second  sweep  of  the  algorithm  for  most  matrix  problems. 


7  Bounds  for  a  min 


Useful  bounds  may  be  attained  at  little  cost  by  exploiting  intermediate  quantities  that 
occur  in  one  step  of  the  unshifted  implicit  Cholesky  transform,  L  =  QL*, 

Consider  an  intermediate  stage  in  the  transformation  of  Z  to  Z  where  Q*L  =  Z*.  Al¬ 
though  Q  is  unique  it  may  be  obtained  as  a  product  in  many  different  ways.  However 
the  choice  narrows  down  when  it  is  desirable  to  take  advantage  of  the  triangular  nature 
of  Z. 


The  first  stage  of  the  reduction  Z  to  L*  is  typical.  Apply  a  sequence  of  (ti  —  1)  plane 
rotations  in  the  planes  (l,j)  ,  j  =  2,...,n  to  map  column  1  of  Z  into  a  multiple  of 
ei  =  (1, 0, . . . ,  0)*.  Call  the  product  of  these  rotations  GJ,  then 


G\L  = 


\M  e\t 

0  Z(2) 


When  the  plane  rotations  are  taken  in  the  natural  order  indicated  above  then  the  “reduced 
matrix”  Z^^^  must  be  lower  triangular  and  may  be  transformed  the  same  way.  Note  that 
use  of  a  Householder  reflector  instead  of  Gi  would  destroy  the  triangular  form  of  Z^^\ 


Definition  The  (1,1)  entry  of  the  reduced  matrix  Z^^^  is  denoted  by  Note  that 
=  ^1,1-  • 


Consider  an  intermediate  stage  in  the  reduction. 

X 

X 

X 

X 

X 

X 

X  X  X  X  X 

X  X  X  X 

dk 

X  X 

XXX/ 

Note  that  row  k  of  the  matrix  is  a  singleton. 


(1) 
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Theorem  1  Trnnsform  nxn  invertible  lower  triangular  L  to  upper  triangular  L*  by  the 
triangle  preserving  algorithm  indicated  in  equation  (1).  Then 


(a)  <T„[i]  <  mint  dt  , 

(b)  [iL^L)-%,  =  d;^ , 

(c)  <  <L]  . 

Proof:  The  key  fact  is  that,  for  each  k, 

e:Gl_,...GlL  =  d,el  (2) 

Since  singular  values  are  invariant  under  unitary  equivalences 

a^[L]  =  (t„[GI.,  . .  .G\L]  <  WelGU  •  •  -Gim  =  d„ 
yielding  (a).  Next  transpose  (2)  and  rearrange: 

Gi . .  .Gk-iCkd^^  =  L~*et., 

dk^  =  dl^elGX_, . .  .G\Gk .  ..Gk-iekdl^  =  elL-^L^ek, 
yielding  (b).  Finally  summing  the  last  equation  over  k  gives 

(r~^  <  =  \\L~^\\f  =  trace[(i*i)-i]  = 

»=1  *=1 

which  is  the  last  inequality  in  (3).  The  first  inequality  holds  for  any  set  of  positive 
numbers:  ||n||2  <  ||t^||i.  • 

Remark  By  Theorem  2  we  may  use  r  =  as  a  shift  with  no  fear  of  breakdown, 

is  the  Newton  shift  from  0  towards  The  reason  is  well  known. 


Theorem  2  Let  A  be  symmetric  positive  definite.  The  Newton  approximation  to  A,„,„[y4] 
from  0  is  lltrace{A~^). 


Proof:  If  x(t)  is  A's  characteristic  polynomial  then 

-x'(0)/x(0)  =  X^Y~  n  “  trace(yl~^) , 
where  x'(0  denotes  the  derivative  of  x(i)  with  respect  to  t.  • 
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8  Incorporating  Shifts 


Given  L  and  t  <  Cmin[L]  one  seeks  X,  lower  triangular  with  positive  diagonal,  such  that 


n-T^i  =  it. 


Take  the  shift  to  the  other  side  and  observe  that  it  is  equivalent  to  compute  an  orthogonal 
2n  X  2n  matrix  Q  such  that 


Let  m.j  denote  column  j  of  M.  In  describing  the  algorithm  fij  denotes  the  current  value 
of  the  (i,  j)  entry  of  F,  not  the  original  one.  Imagine  that  a  square  array  F  is  initialized 
to  contain  L  and  ends  up  holding  t. 


Algorithm  4  (The  Implicit  Cholesky  Transform  with  Shift) 

For  j  =  1, 2, . . . ,  n  —  1  repeat 

(a)  Overwrite  the  current  (i,i)  entry  fjj  with  dj  := 

(b)  Acting  on  rows  j  through  n  find  an  orthogonal  triangle  preserving 
matrix  Hj  such  that  Hjf.j  =  ej||/,j||  and  apply  Hj  to  those  rows. 

fn,n  •—  d„  :=  ^ fn,n  ~ 


Remark  1  An  efficient  way  to  apply  Hj  is  to  use  fast  Givens  rotations.  In  this  case  F  is 
held  as  two  arrays  and  F  where  F  is  unit  triangular  and  L  =  AX. 

Remark  2  The  transformation  alters  the  singular  values  and  the  left  singular  vectors  but 
the  right  singular  vectors  of  X  are  preserved  as  the  left  singular  vectors  of  X. 

Remark  3  Note  that  the  2n  x  2n  matrices  are  introduced  to  display  the  mathematical 
relationships.  In  practice  the  code  resembles  the  program  for  Algorithm  2  except  for 
the  non-orthogonal  modification  of  the  diagonal  entries.  It  is  preferable  to  use  triangle 
preserving  rotations. 

Remark  4  When  shifts  are  used  conclusion  (b)  in  Theorem  1  fails  but  the  other  two 
results  still  hold. 


Theorem  3  Let  X  denote  the  implicit  Cholesky  transform  of  X  with  shift  r(<  CTmin[L]). 
Let  di,  i  =  be  the  intermediate  diagonal  entries  described  in  Algorithm  4.  Then 

(E  <?’)-’ min 

i  =  l 


Proof:  Part  (a)  in  each  step  of  Algorithm  4  cannot  increase  any  singular  value  but  may 
decrease  some  of  them.  If  F  denotes  the  square  array  that  holds  L  initially  and  L*  finally 
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then  Part  (a)  is  equivalent  to  reducing  the  (j,j)  entry  of  FF*  by  r^.  At  the  end  of  Part 
(a)  of  Step  j  the  array  F  has  the  form  (shown  with  n  =  6,  j  =  4) 


/XX 

X 


J’(4)  = 


X  \ 

X 

X 


X  / 


,  Fil)  =  L  ,  i^(n) 


t 


Part  (b)  preserves  singular  values. 

Cn[L]  <  cr„[Fij)]  <  ||e;F(i)||  =  ||d,.e;||  =  dj. 

To  derive  the  first  inequality  return  to  the  2n  x  n  arrays  and  exact  orthogonal  transforms. 

Denote  by  Qk  the  product  of  all  the  orthogonal  transformations  that  deliver  F{k).  More¬ 
over  let  (xj  j/J)  denote  row  k  of  Qk.  Then  l|xi|p  =  1  -  ||ytl|^  <  1  and 

KL  =  {xl  j  =  €;F(k)  =  dkc;. 


Invert  and  transpose  to  find 

dk  ^Xk  =  L  *Uk, 

d;^\\xkr  =  [iL-L)-^]k,,. 

Thus 

^  =  trace[(Z*X)-^]. 

fc=i  *=i 

By  Theorem  2,  the  Newton  approximation  from  0  to  cr^^„[jL]  exceeds  iJ2k=ii  dk^)~^. 
Subtract  to  obtain  a  lower  bound  on  • 

Repetition  of  Algorithm  4  will  not  deliver  singular  vectors  because  after  two  applications 
both  right  and  left  singular  vectors  have  been  lost.  Algorithm  4  is  an  eificient  way  to 
find  a  few  small  singular  values  when  reduction  to  bidiagonal  form  is  not  warranted.  The 
next  section  shows  one  way  to  find  singular  vectors  and  use  shifts. 


9  Computation  of  Singular  Vectors  Using  Shifts 

Let  R  be  upper  triangular  with  positive  diagonal.  The  goal  is  to  compute  some  or  all  of 
R's  singular  values  and  the  corresponding  right  singular  vectors  Vi,V2,. .  .,Vn;  R*Rvj  = 
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VjO^.  If  any  matrix  M  is  multiplied  on  the  right  by  an  orthogonal  matrix  J  then  M's 
right  singular  vectors  are  transformed  by  J*  since 

Mj  =  c/EFv  =  ui:irvy. 

Each  step  in  Algorithm  5  makes  two  transformations;  one  of  them  modifies  the  vectors 
but  makes  no  shift,  the  other  preserves  the  (right)  vectors  but  reduces  the  singular  values. 


Algorithm  5  (Implicit  Shifted  Cholesky  for  Singular  Vectors  of  R) 

Set  V  =  J;  j  :=  n 
While  j  >  0  do 

Repeat  until  the  matrix  is  singular, 

(a)  Transform  R  into  lower  triangular  form  L  by  plane  rotations  on  the  right. 

(b)  Apply  the  transpose  of  each  rotation  to  V . 

(c)  Select  a  shift. 

(d)  Transform  L  into  R  by  plane  rotations  on  the  left  with  shift  t,  as  in  Algorithm  4- 
(the  right  singular  vectors  are  unchanged) 

j:=j -I 


Comment  If  the  initial  matrix  is  lower  triangular  then  begin  the  algorithm  with  a  (d) 
transformation. 


10  Choice  of  Shifts 


Choosing  a  useful  shift  in  LR  and  qd  type  algorithms  is  not  an  easy  task.  One  would  like 
the  shift  to  be  no  larger  than  Omin  but  too  much  caution  begets  sluggish  convergence.  If 
a  shift  is  too  large  then  one  or  more  entries  in  the  new  matrix  become  pure  imaginary 
and  at  the  next  step  entries  become  complex.  To  deal  with  this  situation  appears  to  be 
more  trouble  than  it  is  worth.  So,  at  present,  we  only  record  a  Cholesky  transform  when 
it  is  real.  Aggressive  shifts  that  result  in  late  failure  do  yield  excellent  new  shifts;  see 
Sections  10.3  and  11.3. 


10.1  Johnson  Shift 

A  lower  bound  for  smallest  singular  value  of  R  is  given  by 

5(l;i>-.vl+  E  |r,,.|))). 

^  i=l  k=j+l 

See  [12].  In  practice  we  may  confine  the  search  for  min^-  over  a  few  j  values  surrounding 
m  where  =  mint  dk-  This  heuristic  often  gives  a  good  result  but  it  does  not  guarantee 
a  lower  bound  for  the  smallest  singular  value. 


<rmiv[R]  >  inax[0,  min  {r^j  - 

j=l,n 
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10.2  Aggressive  Shifts 

We  expect  the  shift  strategy  for  these  algorithms  to  evolve  with  experience.  It  is  safe  to 
always  use  the  lower  bound  in  Theorem  3  but  it  seems  a  shame  to  make  no  use  of  the 
increasingly  accurate  upper  bound.  Failures  are  expensive  but  not  disastrous. 

Our  rather  simple  strategy  is  to  let  lo  =  (X)i=i  -  r^,  hi  =  min*  d*  and  set 

r  =  lo  +  a  *  (hi  -  lo) 

after  the  change  in  hi  is  less  than  5%.  A  shift  that  avoids  the  cost  of  the  Newton  shift 
(when  the  bandwidth  is  small,  perhaps)  is  to  save  the  previous  value  of  hi  as  oldhi  and 
set 

T  :=  2  *  hi  -  oldhi. 


10.3  Response  to  Failure 


Suppose  that  the  shift  r  >  Cmin \J-i\  and  that  the  Implicit  Cholesky  eilgorithm  fails  at  step 
k  with  l\  f.  <  0.  Let  us  write  the  relevant  matrices  in  partitioned  form 


n  -  T^I  = 


A  F 
F’  M 


where  A  is  (k-l)x  (A:  —  1)  and  positive  definite.  The  failure  at  step  k  of  implicit  Cholesky 
shows  that  the  Schur  complement  of  A 


W  =  M  -  F*A-'F 

has  Wi.i  =  1  <  0.  Consequently  W  has  negative  eigenvalues.  In  [23]  Wilkinson  showed 

that 

new  r  :=  \/r2  +  [W] 

is  a  safe,  and  accurate,  shift  for  L.  When  k  =  n  and  W  is  1  x  1  we  recover  Rutishauser’s 
late  failure  result.  For  the  analysis  of  this  case  see  Section  11.3. 

An  alternative  response  to  failure  at  step  k  (especially  when  k  is  small)  is  to  use  the  fact 
that  y/r'^  +  Wi^i  <  r  is  a  safe  shift  for  the  leading  k  x  k  submatrix  of  L*L  —  An 
application  of  the  implicit  Cholesky  transform  with  the  new  shift  will  fail,  if  at  all,  at 
some  step  after  k.  This  shift  is  less  work  than  the  one  Wilkinson  suggested  (especially 
when  k  ts  n/2)  but  is  not  guaranteed  to  give  success. 

The  cautious  response  to  early  failure  is  to  abandon  r  and  use  a  lower  bound  on  cr„i,„  as 
the  next  shift.  At  present  we  do  not  have  comparisons  of  the  effectiveness  of  these  two 
alternatives.  For  late,  or  almost  late  failure  (when  W  is  of  small  order)  Wilkinson’s  safe 
shift  seems  preferable. 
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11  Convergence 

11.1  Linear  Convergence 


We  have  stressed  the  fact  that  all  the  algorithms  that  flip  triangular  matrices  using 
one  sided  orthogonal  transformations  are  mathematically  equivalent  to  the  unshifted 
LR  Cholesky  algorithm  introduced  by  Rutishauser  in  [16].  Consequently  his  proofs  of 
convergence  suffice  to  cover  aU  these  variations  and  they  are  more  elegant  than  the  ones 
provided  in  these  more  recent  studies.  We  cannot  resist  the  temptation  to  show  how 
simple  the  arguments  can  be.  A  little  notation  is  needed.  Let 


R*  = 


5  0  \ 
HE) 


H 

E 


be  block  upper  triangular,  S  is  k  x  k,  and  let 


=QR 


with  Q*  =  There  is  no  need  to  partition  Q.  Since  premultiplication  by  an  orthogonal 
preserves  inner  products  between  columns  we  obtain  three  fundamental  relations: 


SS’  +  HH'  =  S'S 

(3) 

EE'  =  H'H  +  E'E 

(4) 

HE'  =  S'H. 

(5) 

Since  HH* 

and  H'H  are  positive  semi-definite  it  follows  that 

‘^.•[‘5']  <  ails’]  ,  1  =  1,..., A; 

(6) 

a,[R]  >cri[E]  ,  1  =  1, , . . ,  n  -  A: 

(7) 

and 

||R||  <  ||R||  IIS-^II  ||R||. 

(8) 

From  (6) 

(9) 

and  thus 

i|R||  <  IIRII  115-^11  ||R||. 

(10) 

These  results  hold  for  any  partition  of  R,  i.e.  any  admissible  choice  of  k.  Provided  that 
R  is  undecomposable,  e.g.  no  H  block  vanishes,  then  as  the  algorithm  proceeds  the  (1, 1) 
blocks  must  increase  and  (2,2)  blocks  must  decrease  when  measured  by  the  Frobenius 
norm.  It  follows  that  the  large  singular  values  must  migrate  to  the  (1, 1)  block  and  the 
small  ones  must  descend  to  the  (2,2)  block.  So  if  all  the  singular  values  are  distinct  then 

the  matrix  must  converge  to  S  =  diag(ai, . .  .,(T„).  However  in  the  presence  of  multiple 
singular  values,  and  finite  precision  arithmetic  too,  the  algorithm  need  not  converge. 


Equations  (8)  and  (10)  allow  us  to  estimate  the  rate  of  convergence.  If  the  partition  size 
k  is  chosen  so  that  there  is  a  gap,  cr*  >>  then  at  some  point  in  the  process 


ll^ll  <  1  (although  ||f;||  i|5-^||  > 

and  then  the  (1,2)  blocks  diminish  monotonically  in  norm.  Nevertheless  convergence  of 
||fr||  is  linear. 


Mathias  and  Stewart  give  a  sufficient  condition  for  monotonicity  to  set  in.  In  our  notation 
their  condition  is  that 

||.ff||  +  ||£ll<<T,[i2].  (11) 

To  see  why  this  suffices  consider  the  matrix 

/  0  S  0  0  \ 

a  H-  a  E 

VO  0  E*  Q  j 

whose  eigenvalues  are  ±£r,[.R],  i  =  1, . . . ,  n. 


By  the  Wielandt-Hoffmann  theorem  applied  to  M  the  singular  values  of  R  may  be  paired 
with  those  of  S  and  E  in  such  a  way  that  the  difference  of  any  pair  is  at  most  ||.H’||.  By 
(11) 


and  so  the  largest  k  singular  values  of  R  cannot  be  paired  with  any  from  E.  Thus  for 
some  j  <  k 


-  ll^ll  >  II 


(13) 


and  by  (11)  and  (13), 


ll^ll 

(rdR]-\\H\\ 


(14) 


When  this  stage  is  reached,  and  (11)  holds,  then  E’s  singular  values  approximate  the 
small  singular  values  of  R  to  0(\\n\\^). 


To  see  this  consider  the  symmetric  indefinite  matrix  M  given  above  and  consider  the 
subspace 


Range 


0 

0  \ 

0 

0 

/ 

0 

lo 

I ) 

The  projection  of  M  onto  this  subspace  is  represented  by 
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and  the  matrix  residual  F  is  given  by 


0 

0 

0 

0 

/  0  0  \ 

0  0 

0  0 

(  F\_ 

H  0 

I  0 

I  0 

1  fl*  0  /  “ 

0  0 

\0  I) 

VO  I) 

\  / 

0 

0 

Thus  ||F||  =  ||B^||.  K  7  is  the  gap  between  E's  singular  values  and  the  largest  k  singular 
values  of  R,  i.e.  7  =  -  ||JB||  then,  by  Chap.  11  of  [15], 

\aAE]-a,M\<\\Ffh=m?h  ,  i  = 


11.2  Quadratic  Convergence 

Sections  7  and  8  show  that  the  Newton  approximation  to  from  0  is  easily 

computed  at  the  end  of  the  LR  transform.  By  the  monotonicity  of  Newton’s  iteration 
from  outside  the  zero  set  we  know  that  r  is  a  safe  shift.  By  Theorem  3,  the  same  formula 
^  when  used  at  the  end  of  a  shifted  Cholesky  step,  stiU  provides  a  lower  bound 
on  OminlL]  but  a  poorer  one  than  Newton’s.  However,  shifts  are  non-restoring  in  the 
implicit  Cholesky  process  and  consequently  the  shifts  will  tend  to  0  and  the  shift  formula 
will  degrade  only  a  little.  Nevertheless  we  have  not  proved  quadratic  convergence  for  this 
shift  formula  in  Algorithm  4. 

On  the  other  hand  Algorithm  5  does  use  Newton’s  shift  and  convergence  is  quadratic  but 
each  step  costs  twice  as  much  as  a  step  of  Algorithm  4.  However  the  goals  of  these  two 
algorithms  are  not  the  same. 


11.3  Higher  Order  Convergence 

Rutishauser  observed  that  when  the  matrix  has  nearly  revealed  the  smallest  singular  value 
in  the  (n,  n)  position  then  there  is  a  shift  strategy  that  generates  cubic  convergence.  The 
key  requirement  is  what  he  calls  late  breakdown.  Suppose  that  r  =  „  exceeds  OminiL] 

by  so  little  that  a  negative  pivot  occurs  only  at  the  last  step.  It  turns  out  that 
yjln.n  +  ^  ®^f®  accurate  shift  for  L.  So  the  strategy  consists  of  making  a  copy 

of  i,  transforming  with  shift  and  then  discarding  intermediate  quantities  except  for 
Then  transform  L  with  shift  ln,n)- 

Neither  Rutishauser  nor  Wilkinson  [23]  emphasize  that  to  employ  this  strategy  it  is 
necessary  to  save  L  and  discard  aU  the  intermediate  quantities  arising  from  using  the 
shift  Thus  the  cost  of  finding  the  good  shift  is  one  whole  iteration.  Perhaps  that  is 
why  this  shift  is  more  talked  about  than  used.  In  fact  there  is  a  small  interval  of  shifts 
^n,n)  which  produce  a  rate  of  convergence  at  least  as  good  as  from 

We  present  here  the  simple  derivation  in  the  case  where  the  initial  shift  r  is  taken  as 
because  the  proof  of  the  more  general  result  is  messier. 


V 
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Write  L  in  partitioned  form  as 


0 


The  assumption  is  that  amin  =  «^nin[T]  <  (^min[L]-  Consequently  the  last  pivot  in  the 
triangular  factorization  oi  L*L  —  must  vanish.  This  gives 

Fact  1.  =  0 

where  V  :=  L*L  +  bb*. 


When  the  shift  r  =  /„  „  is  used  the  late  failure  assumption  says  that  the  last  pivot  is  the 
only  negative  one.  This  gives 

Fact  2.  d^=  0  -  ll„b'[V  -  llJ„-i]-^b  <  0. 


Hence 

/n,n  +  dl  =  +  ll„mv  -  -  b'[V  - 

By  Hilbert’s  first  resolvent  identity  (see  p  90  of  [3]) 

lln  +  dl  =  "  ^InWlV  - 

Now  use  Fact  1  again  to  find  that 


It  is  more  informative  to  write  6  =  ||6||6  so  that 

ln,r.  +  dl  =  -  (/„,„||h||r6*[F  -  al,J„.,]-^b{b[V  -  -  llJ„_,]-^b}. 

The  quantities  involving  b  are  bounded  by  l/(cr2_JI,]  -  al[L]f  as  /„,„||61|  -»•  0.  • 


Although  this  result  is  interesting  we  doubt  that  the  trial  shift  is  cost  effective  in  the 
full  triangular  case.  Wilkinson  and  Rutishauser  did  not  know  the  bounds  on  cr„[X]  given 
by  Theorem  3,  and  so  we  can  use  more  refined  strategies  than  they  did. 


12  A  Preconditioner  for  Jacobi 


Jacobi  invented  his  famous  eigenvalue  algorithm  (given  in  [11])  as  a  preconditioner  for 
what  we  now  call  the  point  Jacobi  method  for  solution  of  linear  (symmetric)  equations. 
See  [9],  [lOj.  One  might  ask  the  wisdom  of  designing  a  preconditioner  for  another  “precon¬ 
ditioner”.  The  answer  is  that  the  classical  Jacobi  method  in  which  the  largest  offdiagonal 
element  is  chosen  as  the  doomed  element  is  no  longer  the  preferred  version  for  computa¬ 
tion  of  eigenvalues.  Instead  the  row  (or  column)  cyclic  annihilations  are  used  in  Jacobi 
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algorithms  and  this  strategy  does  not  have  good  initial  convergence  properties,  see  Fer¬ 
nando  [6],  although  asymptotically  the  convergence  is  quadratic.  Hence  the  use  of  a 
preconditioner  for  cyclic  Jacobi  is  not  a  futile  effort. 

The  flipping  algorithm  (Algorithm  2)  has  a  non-trivial  application  in  one-sided  Jacobi 
algorithms  belonging  to  the  Hestenes  family,  see  [8].  Veselic  and  Hari  [21]  have  shown 
that  one  step  of  the  Cholesky  algorithm  with  diagonal  pivoting  helps  the  convergence 
of  Hestenes  algorithm  for  some  cases.  That  is,  the  Cholesky  algorithm  can  be  used  to 
precondition  the  Hestenes  type  SVD  algorithms.  By  recognizing  that  the  LR  algorithm 
can  be  implemented  using  orthogonal  or  unitary  transformations,  it  is  possible  to  execute 
extra  preconditioning  steps.  Jacobi  algorithms  are  more  appropriate  at  the  later  stages 
as  they  possess  quadratic  convergence  whereas  LR  algorithms  without  shifts  are  linearly 
convergent,  so  only  a  few  preconditioning  steps  should  be  used.  It  is  not  yet  known 
whether  Jacobi  will  beat  implicit  Cholesky  with  shifts.  This  preconditioner  can  be  also 
used  for  the  Jacobi  method  due  to  Kogbetliantz  for  computation  of  the  SVD.  If  the 
matrix  is  triangular,  the  Kogbetliantz  method  retains  that  structure  while  Hestenes  does 
not.  See  Fernando  [6]. 

Let  us  indicate  how  the  preconditioning  step  may  be  applied  more  than  once.  Let  A  be 
symmetric  positive  definite.  The  authors  mentioned  above  perform  a  Cholesky  factoriza¬ 
tion  with  diagonal  pivoting  to  produce  L  where 

P'AP  =  LL\ 

The  one-sided  Jacobi  algorithm  applies  a  sequence  of  plane  rotations,  on  the  right,  to 
an  array  F  which  is  initialized  to  either  X  or  Z*.  Each  rotation  makes  the  associated 
columns  of  F  orthogonal.  On  exit  the  norms  of  F's  columns  are  the  singular  values  of 
L  and  the  columns  themselves  are  eigenvectors  of  A.  Veselic  and  Hari  observed  that 
initializing  F  to  X,  instead  of  X*,  is  equivalent  to  performing  Jacobi  on  X*X  rather  than 
XX*  =  A.  In  other  words  the  Jacobi  process  is  preconditioned  (implicitly)  by  one  LR 
transform.  This  preconditioning  sometimes  improves  convergence  dramatically.  Why 
not  do  another  step  of  preconditioning  ?  Veselic  and  Hari  were  unwilling  to  discard  the 
accuracy  inherent  in  Jacobi  by  explicitly  forming  X*X  in  order  to  compute  the  Cholesky 
factor  X  (X*X  =  ii*)  but  as  explained  in  this  paper,  there  is  no  need  for  explicit  multi¬ 
plication.  Algorithm  2  yields  X,  without  forming  X*X,  using  plane  rotations  exclusively. 
Now  F  may  be  initialized  to  X  instead  of  X.  The  process  may  be  repeated  and  in  fact, 
should  be. 

Let  us  mention  a  subtle  point  concerning  preconditioning  for  a  right  sided  Jacobi  process 
applied  to  a  matrix  F,  Let  A  =  XX*  and  let  X  =  UHV".  If  /’  is  initialized  to  X*  then, 
on  exit,  the  columns  of  F  when  normalized  give  V.  U  F  is  initialized  to  X  then,  on 
exit,  the  columns  of  F  yield  U.  If  we  do  an  extra  preconditioning  step  from  L  to  L 
then  initializing  F  to  L  will  eventually  yield  V.  Another  preconditioning  step  L  L 
and  initializing  F  to  L  will  eventually  yield  U  and  again  there  is  no  need  to  accumulate 
rotations  and  this  feature  saves  both  storage  and  arithmetic  operations.  However,  the 
accuracy  and  especially  the  orthogonality  of  the  vectors  computed  this  way  might  be 
poorer  than  for  the  accumulated  vectors.  Nevertheless,  the  moral  of  the  story  is  that  it 
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pays  to  use  an  even  number  of  preconditioning  steps  for  one  vector  set,  an  odd  number 
for  the  other. 

For  those  who  are  appalled  at  the  thought  of  preceding  a  Jacobi  process  by  a  non- 
orthogonal  transformation  A  ->•  LL*,  there  is  a  remedy.  Compute  the  QR  factorization 
of  Hermitian  A,  ignoring  A's  symmetry  but  using  column  pivoting  or  some  other  rank 
revealing  permutation.  Then  initialize  the  matrix  F  Xo  R  or  R*.  In  exact  arithmetic 
this  preconditioning  is  equivalent  to  using  the  Cholesky  factor  of  The  advantages  of 
this  more  expensive  preconditioner  is  that  there  is  no  need  for  A  to  be  positive  definite. 
Furthermore,  the  QR  step  can  be  done  in  parallel  machines,  including  systolic  arrays, 
while  the  Cholesky  step  cannot  be  executed  so  easily  in  parallel.  If  the  matrix  is  not  sign 
definite,  the  signs  of  i4’s  eigenvalues  can  be  recovered  on  exit  by  using  the  eigenvectors 
formed  in  the  Jacobi  process.  Convergence  will  be  much  faster  than  when  the  precondi¬ 
tioner  is  a  Cholesky  factor  because  the  convergence  factors  are  squared  as  our  numerical 
examples  reveal. 

In  the  above  algorithm,  we  assumed  that  the  matrix  is  Hermitian.  If  the  matrix  is 
skew- Hermitian,  the  same  algorithm  can  be  used  by  noting  that  now  the  eigenvalues  are 
conjugate  imaginary  pairs  whose  magnitudes  are  given  by  the  singular  values.  So  there 
is  no  concern  about  recovering  the  signs  of  the  eigenvalues. 

Once  we  have  taken  the  fateful  step  of  contemplating  a  preconditioner  for  a  Jacobi  process 
we  are  lead  inexorably  to  the  message  of  this  paper.  Why  not  use  the  implicit  Cholesky 
algorithm  with  shifts  as  a  preconditioner?  There  is  no  loss  of  accuracy.  The  next  question 
is:  if  the  shifts  are  well  chosen  why  switch  to  Jacobi?  Time  will  tell. 


13  Numerical  Examples 


The  goals  of  this  study  are  (i)  to  point  out  the  feasibility  and  the  desirability  of  shifting  in 
the  impbcit  Cholesky  algorithm,  (ii)  to  present  new  computable  upper  and  lower  bounds 
on  singular  values.  We  are  not  presenting  a  particular  implementation  and  so  content 
ourselves  with  offering  some  simple  illustrations. 

Two  types  of  triangular  matrix  were  used  as  input.  Any  symmetric  positive  definite 
matrix  A  may  be  decomposed  as  A  =  LL'  and  as  A  =  QR.  Thus  R  is  the  Cholesky 
factor  of  A^  and  its  singular  values  are  the  squares  of  those  of  L* .  We  present  results 
on  triangular  matrices  arising  from  a  10  x  10  reverse  Hilbert  matrix  and  from  a  20  x  20 
ToepHtz  matrix  {A{i,j)  =  n  -  |i  -  j|,  t  =  1, . . j  =  i, . . . , n). 

For  each  triangular  matrix  we  tried  to  present  the  results  of  3  shifting  strategies:  no  shift, 
Newton  shift,  and  a  simple  aggressive  shift  described  in  Section  10.2. 
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Before  commenting  on  the  results  we  discuss  our  stopping  criterion.  Let 


R  = 


S  h\ 
O  E  ) 


be  the  current  matrix.  We  declare  H  negligible  when 

||.ff||  <  tol  *  macheps  *  ||original  iZ||. 

For  the  examples  given  here  E  was  1x1  and  ||  •  ||  =  ||  •  Hoc-  Sometimes  convergence 
would  occur  sooner  if  we  allowed  E  to  have  larger  dimension  but  when  is  1  x  1  we  do 
learn  the  number  of  steps  needed  to  find  the  singular  values  in  order. 

We  began  with  iol  =  1  (i.e,  maximal  absolute  accuracy)  and  used  a  Sun  Sparc  Work¬ 
station  ELC  with  macheps  =  2  x  10“^®.  With  no  shift  the  algorithm  failed  to  converge 
after  2  minutes  on  the  Toeplitz  example.  It  still  failed  with  tol  —  10®  so  we  ended  up 
using  tol  =  10®  just  to  get  a  comparison,  see  Tables  1  and  2.  Even  with  this  value  of  tol 
the  Cholesky  factor  of  the  Toeplitz  matrix  required  1179  steps  with  no  shift  as  compared 
with  110  for  the  Newton  shift  and  84  for  the  aggressive  shift. 

The  Toeplitz  example,  see  Tables  1,  2,  3,  and  4,  shows  clearly  the  potential  benefits  of 
shifting.  The  smaller  singular  values,  see  Tables  9  and  10,  are  not  very  small  and  their 
ratios  are  not  very  small  either.  On  the  other  hand  the  differ  in  the  second 

decimal  so  that  they  could  not  be  considered  clustered.  As  is  clear  from  Tables  1  and  2  the 
(19,20)  entry  is  the  last  to  become  negligible.  In  fact,  with  no  shift,  the  singular  values 
converge  in  the  order  ai,  <T2,  . .  .,020-  However  our  purpose  is  to  show  how  shifts  can  help 
and  465  steps  are  needed  to  make  the  (19,20)  entry  go  below  10®  *  macheps  *  norm. 
In  contrast  the  Cholesky  factor  of  the  reverse  Hilbert  segment  has  much  smaller  vcdues 
for  see  Tables  13  and  14,  and  the  improvements  from  shifting  are  much  less 

dramatic,  see  Tables  7  and  8.  Tables  3,  4,  5,  and  6  show  two  things:  the  potential 
power  of  good  shifts  and  the  slow  down  caused  by  the  poorer  ratios  associated  with  the 
Cholesky  triangle,  see  Tables  9,  10,  11,  and  12. 

The  quality  of  our  bounds  on  Cmin  is  illustrated  in  Figs.  1  and  3.  Recall  that  in  Algo¬ 
rithm  5  a  shift  is  only  used  on  alternate  flips  when  singular  vectors  are  requested.  The 
lower  bound  in  the  shifted  flip  is  poorer  than  in  the  unshifted  one  and  that  accounts  for 
the  step  like  appearance  of  the  lower  bound  in  Fig.  1. 

Note  that  the  upper  bound  is  a  much  better  approximation  than  the  lower  one.  Finally, 
Figs.  2  and  4,  we  give  a  snapshot  of  the  way  in  which  ||.H^||oo  declines  under  two  shift 
strategies. 

Even  these  preliminary  results  suggest  that  implicit  Cholesky  with  good  shifts  is  not  just 
a  tool  for  low  accuracy  applications  but  can  be  used  in  high  accuracy,  LAPACK  style, 
mode  as  well.  It  may  be  the  method  of  choice  whenever  reduction  to  bidiagonal  form  is 
too  much  trouble. 
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Toepiitz  Triangle:  QR  Mode:  Vectors 

i 

20 

19 

18 

17 

14 

13 

12 

11 

No  Shift 

455 

456 

461 

462 

463 

464 

Newton 

35 

36 

45 

52 

58 

60 

61 

66 

69 

70 

Aggressive 

17 

18 

25 

31 

37 

38 

39 

45 

49 

54 

Table  1;  Steps  needed  to  find  {<Tn, . . .  ,<»■»}»  =  10*- 


Toepiitz  Triangle:  QR  Mode:  Vectors  (cont.) 

10 

9 

8 

7 

6 

5 

4 

3 

2 

No  Shift 

465 

466 

471 

MSEM 

Newton 

71 

72 

73 

74 

75 

76 

77 

78 

79 

79 

57 

58 

59 

60 

61 

62 

63 

64 

65 

65 

Table  2:  Steps  needed  to  find  {«rn, . . .  >  <r<},  tol  =  10®. 


Toepiitz  Triangle:  QR  Mode:  Vectors 

i 

13 

12 

11  1 

Newton 

39 

46 

118 

to 

Aggressive 

19 

26 

68 

74 

79  1 

Table  3:  Steps  needed  to  find  {«rn,  •  •  •  tol  =  1. 


Toepiitz  Triangle:  QR  Mode:  Vectors  (cont.) 

10 

9 

8 

6 

5 

4 

3 

Hll 

Newton 

130 

131 

136 

Aggressive 

84 

88 

93 

94 

97 

98 

99 

Table  4:  Steps  needed  to  find  {<rn, . . . ,  o-^},  to/  =  1. 


Toeplitz  Triangle:  Cholesky  Mode:  Vectors 


i 

20 

19 

16 

15 

14 

13 

12 

11 

Newton 

54 

62 

76 

88 

101 

102 

148 

Aggressive 

20 

29 

37 

44 

53 

54 

62 

69 

76 

82 

Table  5:  Steps  needed  to  find  {tTm ...» tol  =  1. 


Toeplitz  Triangle:  Cholesky  Mode:  Vectors  (cont.) 


10 


8 


Newton 


158 


164 


171 


172 


173 


174 


175 


176 


177 


177 


Aggressive  89  95  100  102  108  112  115  116  117  117 


Table  6:  Steps  needed  to  find  {trn, . . . ,  tri},  tol  =  1. 


Reverse  Hilbert  Triangle:  QR  Mode:  Vectors 

10 

o 

B 

B 

6 

5 

4 

3 

m 

No  Shift 

3 

Q 

8 

13 

31 

Newton 

3 

Q 

8 

11 

13 

23 

29 

29 

Aggressive 

3 

Q 

B 

11 

13 

16 

19 

23 

Table  7:  Steps  needed  to  find  ,  tr^},  tol  =  1. 


Reverse  Hilbert  Triangle:  Cholesky  Mode:  Vectors 


i 

10 

9 

8 

7 

6 

5 

4 

3 

2 

1 

No  Shift 

10 

14 

17 

21 

53 

53 

Newton 

5 

21 

25 

29 

46 

46 

5 

11 

16 

20 

24 

28 

32 

36 

41 

41 

Table  8:  Steps  needed  to  find  {o-n, . . . ,  o-j},  toZ  =  1. 


Toeplitz  Triangle:  QR 

i 

20 

19 

18 

17 

16 

15 

14 

13 

12 

11 

Values 

0.503 

0.512 

0.529 

0.552 

0.586 

0.629 

0.688 

0.762 

0.865 

0.995 

Ratio 

0.982 

0.969 

0.957 

0.943 

0.931 

0.915 

0.903 

0.881 

0.869 

0.839 

Table  9:  Singular  Values 


Toeplitz  Triangle:  QR  (cont.) 

i 

10 

9 

8 

7 

6 

5 

4 

3 

2 

1 

Values 

1.19 

1.43 

1.83 

2.38 

3.41 

5.00 

9.17 

17.2 

81.2 

270. 

Ratio 

0.827 

0.783 

0.769 

0.697 

0.682 

0.545 

0.532 

0.212 

0.300 

Table  10:  Singular  Values 


Toeplitz  Triangle:  Cholesky 

i 

20 

19 

18 

17 

16 

15 

14 

13 

12 

11 

Values 

0.709 

0.716 

0.727 

0.743 

0.765 

0.793 

0.829 

0.873 

0.930 

0.998 

Ratio 

0.991 

0.984 

0.978 

0.971 

0.965 

0.956 

0.950 

0.939 

0.932 

0.916 

Table  11:  Singular  Values 


Toeplitz  Triangle:  Cholesky  (cont.) 

i 

10 

9 

8 

7 

6 

5 

4 

3 

2 

1 

Values 

1.09 

1.20 

1.35 

1.54 

1.85 

2.24 

3.03 

K&9 

9.01 

16.4 

0.909 

0.885 

0.877 

0.835 

0.826 

0.739 

0.730 

0.548 

Table  12:  Singular  Values 


Reverse  Hilbert  Triangle;  QR 

i 

10 

9 

8 

7 

6 

5 

4 

3 

2 

1 

Values 

.lle-12 

.23e-10 

.21e-8 

.12e-6 

.47e-5 

.13e-3 

.25e-2 

.36e-l 

.34 

1.8 

Ratio 

.48e-02 

.lle-01 

.17e-l 

.26e-l 

.37e-l 

.51e-l 

.71e-l 

.10 

IB 

Table  13:  Singular  Values 


Reverse  Hilbert  Triangle:  Cholesky 

i 

10 

9 

8 

7 

6 

5 

4 

3 

2 

1 

Values 

.33e-6 

.48e-05 

.46e-4 

mi 

Ratio 

.69e-l 

.10 

.13 

.16 

.19 

.23 

.27 

.32 

.44 

■11 

Table  14:  Singular  Values 


Fig.  4  Decline  of  max  element  in  last  column,  tol  =  le8 
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